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ABSTRACT 


CSKYTI  is  an  out -of -core  profile  approach 
Cholesky  algorithm  equation  solver  for  large  sparse 
positive  definite  systems  of  linear  equations. 

CSKYTI  is  an  adaptation  of  an  existing  CDC  6000  series 
program,  CSKYDC2,  for  the  Texas  Instruments'  Advanced 
Scientific  Computer  (ASC  6)  at  the  Naval  Research 
Laboratory.  Tlie  preprocessor  progr.'im  SL.TIIP  for 
CSKYTI  was  similarly  obtained  from  CSKYlKiZ's  Sl.TIIP 
program.  Tliis  report  documents  a comiiarison  of  the 
performance  of  CSKYTI  and  its  SHTUP  on  the  ASC  6 
with  that  of  CSlCtT)G2  and  its  SITIT  on  the  CIX!  6600  at 
ITIYISRIXT  It  would  appear  that  the  ASC  6 requires  double 
precision  arithmetic  to  obtain  the  significance  of 
single  precision  computation  on  the  CDC  6600.  liven  so, 
the  ASC  6 was  found  to  be  fast  compared  to  the  CDC  6600. 


iNmonucTioN 


One  of  the  long  range  projects  of  the  Computation,  Mathematics,  and 
Logistics  Department  has  been  the  development  of  mathematical  subroutines 
suitable  for  use  in  the  computer-aided  structural  analysis  of  ships. 

Many  unrelated  efforts  in  both  government  and  industry  have  resulted  in 
computer  programs  that  treat  particular  classes  of  structural  problems. 
These  programs  often  involve  the  solution  of  similar  mathematical 
problems  but,  since  the  solutions  are  reached  independently,  the  efficiency 


and  accuracy  of  the  various  algorithms  used  may  vary  greatly.  The  need  to 
coordinate  these  diverse  efforts,  to  develop  improved  methods  of  more 
general  applicability,  and  to  produce  more  comprehensive  programs  for 
solving  Navy  structural  problems  became  obvious.  A project  was  therefore 
established  to  coordinate  research  efforts  involving  mathematical  and 
computational  methods  in  the  area  of  structural  mechanics  and  to  integrate 
the  work  of  mathematicians,  computer  specialists,  and  strvjctural 
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engineers  in  this  field. 

The  present  considerable  interest  in  the  finite  element  approach  to 
structural  analysis  is  evidenced  by  the  widespread  use  of  NASTRAN  (NAsa 
STRuctural  AXalysis  program)  and  other  such  programs.  According  to  the 
NiA.STRAN  Theoretical  Manual,^  "From  a theoretical  viewpoint,  the  formula- 
tion of  a static  structural  problem  for  solution  by  the  displacement 
method  is  completely  described  by  the  matrix  equation  KU=P."  Thus,  there 
is  a need  for  accurate  efficient  computer  subroutines  capable  of  solving 
these  large  sparse  positive  definite  systems  of  simultaneous  linear 
equations.  However,  the  order  of  K is  often  so  large  that,  even  when 
advantage  is  taken  of  K's  symmetry  and  banded  structure,  it  is  not  feasible, 
and  sometimes  not  even  possible,  to  store  K in  the  core  memory  of  a 
computer. 

To  handle  such  a case  the  computer  program  CSKYPG2  was  developed. 

CSKYncZ  is  an  out -of -core  Cholesky  algorithm  equation  solver  which  takes 

2 

the  profile  approach.  CSKYDG2  makes  use  of  SFTUP,  a preprocessor  program 
which  stores  the  profile  of  the  upper  triangular  half  of  the  matrix  K in 
random  access  files.  CSKYDG2  and  SETUP  were  written  in  FORTRAN  Extended 
for  the  cue.  6000  series  of  computers.  The  present  program  CSiOTl  was 
obtained  by  maiifying  CSKYDG2  so  as  to  take  as  much  advantage  as  feasible 
of  the  special  features  of  the  Texas  Instnjments'  Advanced  Scientific 
(‘omputer  (TI  ASC  6)  at  the  Naval  Research  Laboratory.  The  .SETUP  pre- 
processor program  for  CSKYTI  was  obtained  similarly  from  the  SETUP  program 
for  CSKYTX12.  This  conversion  was  undertaken  chiefly  to  investigate  the 
effectiveness  of  the  A.SC's  "pipe  line"  architecture  with  respect  to  the 
"vcctorization"  of  computer  code  by  comparing  the  performance  of  the  two 
versions  of  the  same  program  on  the  ASC  and  the  CDC  6600,  respectively. 


'The  NASTRAN  Theoretical  Manual,"  edited  by  R.H.  MacNeal,  National 
Aeronautics  and  Space  Administration,  Washington,  D.C.,  (1969)  p.  3.1-1. 
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Gignac,  D.A. , "CSKYDG2;  An  Out-of-Core  Cholesky  Algorithm  Equation 
.Solver  (with  Respect  to  Profile)  for  Large  Positive  Definite  Systems  of 
Linear  Fxjuations,"  Naval  Ship  Research  & Development  Center  Report  4655 
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MATHEMATICAL  DETAILS 


♦ 


DEFINITION  OF  TERMS 

• Positive  Definite 

A real  symmetric  matrix  A of  order  n is  said  to  be  "positive 

T 

definite"  if  its  associated  quadratic  form  X AX  is  positive  for  all 
vectors  X other  than  the  null  vector  0.  A is  also  characterized  by  the 
fact  that  all  its  eigenvalues  are  positive.  (The  reader  will  remember 
that  all  the  eigenvalues  of  a real  symmetric  matrix  are  real.)  Clearly  A 
is  also  nonsingular,  that  is,  A ^ exists.  A system  of  simultaneous 
linear  equations  in  n unknowns,  AX=B,  is  said  to  be  positive  definite  if 
A is  positive  definite.  In  theory,  such  a system  can  never  be  singular, 
though  numerically  it  may  turn  out  to  be  nearly  so. 

• Bandwidth 

A real  symmetric  matrix  A of  order  n is  said  to  be  banded  and 
have  a bandwidth  of  m where  1 - m - n if  for  >ni.  The 

quantity  m is  sometimes  also  called  the  semi -bandwidth  of  A. 

• Profile 

The  "profile"  of  a real  symmetric  matrix  A is  the  upper  triangular 
half  of  A stored  columnwise  from  the  diagonal  element  of  the  column  to  the 
last  non -zero  element  of  that  column. 

• Active  Column 

•An  "active  column"  of  a real  symmetric  matrix  A is  a column 
which  results  in  an  undesirably  large  bandwidth. 

aiOLESKY  ALCX3RITHM 

The  (Tiolesky  algorithm  solution  of  a positive  definite  system  of 
simultaneous  linear  equations  AX=B  consists  of  two  steps: 

• The  factorization  of  A into  the  product  of  a lower  triangular 
matrix  S and  its  transpose  T --  that  is,  finding  S such  that  A=ST.  This 
is  the  Cholesky  decomposition  of  A.  S and  T are  the  unique  Cholesky 
factors  of  A. 

• The  joint  solution  of  the  equivalent  pair  of  triangular  systems 


.3 


SY  = B 
TX  = Y 

the  first  of  which  is  easily  solved  by  forward  substitution  and  the  second 
by  backward  substitution. 

The  Cholesky  algorithm  is  both  accurate  and  efficient.  It  is  also 
quite  stable  and  does  not  require  pivoting. 

Making  use  of  symmetry  we  may  rewirte  the  usual  formulas  for  the 
columnwise  Cholesky  decomposition  of  A entirely  in  terms  of  T. 
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The  usual  formulas  for  the  forward  substitution  in  terms  of  T are 


yi 

^i 


'iK 


i-1 

j;  t,  . b, ) i=2,...,n 

k=l 


The  usual  formulas  for  the  backward  substitution  in  terms  of  T are 
■1 


X = t y 
n nn-^n 


n 

Xi  = ^ ^ 

^ ^ k=it-l 


examination  of  these  formulas  leads  to  the  following  observations; 

• It  is  more  efficient  to  incorporate  the  forward  substitution 
into  the  Cholesky  decomposition,  provided  that  one  does  not  wish  to  solve 
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AX=B  for  several  right-hand  sides  h with  the  same  left-hand  side  A. 

• Fonnulas  can  be  modified  to  take  advantage  of  the  fact  that 
A and  T have  the  same  profile. 

I {YPnRM/\TR  1 X AP  PROAO  \ 


.n^  fwhere 


The  hypermatrix  approach  consists  in  regarding  the  elements  of  a 
matrix  A as  some  partitioning  set  of  submatrices  A^j  rather  than  the 
scalars  a^j  . For  example,  if  n,  the  order  of  A,  equals  nj^+n2+. 
the  n^  are  integers) , then  A can  be  regarded  as  a "hypermatrix”  of  order  I 
whose  elements  are  the  obvious  partitioning  set  of  H^n^  xn^  submatriccs. 
"Hypervectors"  are  defined  similarly.  Tlius  the  system  AX=B  defines  a 
hv^permatrix  system  A'X'=B'  of  order  i. 

In  extending  the  Cholesky  algorithm  to  hypermatrices  and  replacing 


the  scalars  a. 

1 

following; 


bv  the  submatfices  A. . in  the  above  formulas,  note  the 

ij 


• Matrix  multiplication,  unlike  scalar  multiplication  is  not 
commutative. 

• Scalar  division  is  replaced  by  the  solution  of  a triangular 
system. 

• Fhe  extraction  of  square  roots  is  replaced  by  Choleslcy 
decomposition. 

The  formulas  for  the  hypermatrix  Qiolesky  decomposition  in  terms  of  the 
submatrices  of  T become 
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The  fonmilas  for  the  hypermatrix  forward  substitution  are 


Y. 

1 


i-1 


^ 


The  formulas  for  the  h\permatrix  backward  substitution  are 


X = T'^  Y 
n nn  n 


■ k=Ll 


i=n-l .... ,1 


Note  that  the  positive  definiteness  of  A guarantees  that  those  matrices 
which  must  be  invertible  or  positive  definite  according  to  the  above 
formulas  are,  in  fact,  invertible  or  positive  definite.  This  statement  is 
so  easily  established  as  not  to  require  formal  proof  here. 


SliTUP  A.NT)  CSKTll 


Sr.'lTJI''  is  3 preprocessor  subroutine  which  packs  the  profile  of  A in 
QDAM  ((Jued  Direct  Access  Method)  files  on  tape  1,  according  to  the 
specified  partition.  The  STTUT  subroutine  first  reads  the  order  of  A and 
then  reads  the  rows  of  the  upper  triangular  half  of  A from  tape  2.  Idie 
Simil’  subroutine  has  four  arguments.  The  partition  is  specified  in  the 
first  two  arguments,  NS  and  NNiS,  where  NS  is  the  partition  vector  and  NXS 
is  the  number  of  partitions.*  fhe  profile  is  characterized  in  the  output 
of  the  last  two  arguments  with  respect  to  the  given  partition.  Tlie  lASTBl.K 
array  contains  the  locations  of  the  diagonal  submatrices  of  the  h>pcrm;itrix. 
ITie  NINIM.KS  array  specifies  the  number  of  submatrices  stored  from  each 
column  of  the  hypermatrix. 

* 

IMl’ORTA.NTl:  It  must  be  pointed  out  to  the  reader  that  the  prognun 
published  in  this  report  expects  a uniform  partition  mesh  of  10  except 
for  the  NNSt*^  partition  whoso  width  is  i , 1 ^ i < 10.  This  limitation  was 
introduced  to  facilitate  the  use  of  QDAM. 
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The  CSICiTI  subroutine  effects  a Cholesky  algorithm  solution  of  a 
system  of  linear  equations  via  the  profile  approach,  using  the  submatrices 
packed  in  random  access  memory  by  Ilie  CSKYTl  subroutine  has  five 

arguments,  XNS,  NS,  lASTBLK,  NUMBLKS,  and  B --  the  first  four  as  already 
described.  The  right-hand  side  of  a system  of  linear  equations  is  passed 
through  the  B array  and  the  solution  of  the  system  is  returned  in  the 
same  array. 

As  noted  previously  SliTUP  and  CSKATl  a.'-e  AST  6 adaptations  of  programs 
written  for  the  CDC  6000  series  of  compute  -s. 


mi;  Ti;sT  i:xANiid.T.s 

FXkNiri.i;  1 

The  matrix  f.'unily  of  Table  1,  , is  generated  as  follows;  I.et  N 

be  an  integer  ^ 7>.  I.et  be  the  tridiagonal  of  order  N with  4's  on  the 
diagonal  and  a line  of  -I's  above  and  below  the  diagonal.  I.et  I,^,  be  the 
identity  matrix  of  order  N.  An  (\+l) -banded  matrix  of  order  N^ , is 
const nic ted  by 

fl)  stringing  N T.^.  submatrices  along  the  diagonal, 

(2)  inserting  lines  of  N-1  -I^  submatrices  above  and  below  the 
d i agona 1 , and 

(.3)  setting  the  remaining  elements  of  equal  to  0. 

.Application  of  Gerschgor in's  theorem  shows  A^,j  to  be  positive  definite. 

The  right-hand  side  of  the  system  i-''  chosen  such  that  all 

components  of  the  exact  solution  vector  have  the  value  1. 

i;XAMId.i;  2 

The  matrix  family  of  Table  2,  A^2 > generated  as  follows:  Let  N be 
an  integer  - 7>.  Assume  the  real  symmetric  matrix  of  order  and  bandwidth 
N with  N^  on  the  diagonal  and  -1  elements  filling  out  the  rest  of  the  band. 
Then  change  the  value  of  each  zero  element  in  the  last  N rows  and  columns 
to  a -1.  As  before,  Gerschgorin's  theorem  shows  A^2  he  positive 
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definite.  Note  that  from  the  viewpoint  of  bandwidth,  /\^,p  is  a full 
matrix.  The  right-hand  side  of  the  system  A^.?X  = i.'s  chosen  such  that 
all  the  components  of  the  exact  solution  vector,  except  the  last,  which  is 
1,  arc  ccro. 


I 


r 


TABI.i;  NOTATION 


1 2 

ilic  solution  times  for  .\jX  = and  /'y^X  = arc  tabulated  in  Tables 

1 and  2.  respectively.  Tliesc  test  problems  were  originally  run  on  the 

CIXI  6600  at  PTNSRIX'  u'^ing  the  OPT=l  FORTR/VN  hxtended  compiler  under  the 

•> 

scon.  S..S  operating  system. “ They  were  rerun  on  the  .-VSC  6 at  NRl,  using 
the  NX  I'ORTRVN  compiler  under  the  Ccncral  niq-iosc  (''perating  System  (CPOS). 
Tlic  column  headings  arc  dcfincti  as  follows: 

N - the  order  of  the  system, 

M - system  bandwiilth, 

NS  - tlic  partition  vector, 

NN;  - the  order  of  the  h>T>ermatrix  system. 

.'■M  - Ipjicrmatrix  system's  biindwidth, 

'srniP  " time  required  to  pack  the  hqicrmatrix  system  into 

r;indom  access  mass  storage, 

Isom  ' tine  required  to  solve  the  Ipjicmiatr ix  system,  and 


huiAl, 


the  sum  of  the  lsoi\q:  tinics. 


fThc  above  times  are  given  in  terms  of  CPU  seconds.) 


lABIi;  : - A U1M}’AR1S0\  01  OSKVl  1 .WD  CSKYlXiJ  h.Ml.lT 


OBSERVATIONS 


At  the  outset  it  was  suspected  that  ASC  single  precision  computation 
(32-bit  word)  could  not  provide  solution  significance  comparable  to  that 
of  the  ox;  6000  series  (60 -bit  word).  This  was  found  to  be  the  case. 
However,  since  the  ASC  double  precision  word  is  slightly  more  significant 
(64  bits)  than  the  CDC  60-bit  word,  maintaining  significance  proved  to 
be  no  problem,  all  the  more  so  since  the  ASC  user  is  not  unduly  penalized 
for  resorting  to  double  precision  arithmetic  to  maintain  significance. 
(Changing  from  single  to  double  precision  arithmetic  on  the  CDC  6000 
series  almost  quadruples  the  running  time  of  a given  program.)  Both 
CSKYTI  and  CSKYDG2  provided  the  same  significance- -at  least  12  to  13 
significant  digits--for  the  esamples  in  this  report. 

Clearly  the  improvement  in  computing  time  can  only  be  described  as 
phenomenal.  The  ASC  version  of  SETUP  runs  up  to  31  times  faster  than 
the  CDC  6600  version,  CSKYTI  runs  about  5 times  faster  than  CSKYDG2  and 
the  total  time  may  be  as  much  as  17  times  faster.  The  remarkable  improve- 
ment in  SETUP'S  running  time  results  from  the  more  efficient  reading  and 
KTiting  of  scratch  tape  2 (which  is  really  on  a disk)  by  the  head -per - 
track  disk  drives  of  the  ASC.  The  considerable  improvement  in  CSKYTI 's 
running  time  over  than  of  CSKYDG2  is  no  doubt  due  to  the  fair  amount  of 
vectorization  provided  by  the  NX  FORTRAN  so  as  to  exploit  the  "pipeline" 
architecture  of  the  ASC. 
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(1)  DTNSRDC  RETOKTS.  A FORMAL  SERIES  PUBLISHING  INFORMATION  OF 
PERMANENT  TECHNICAL  VALUE.  DESIGNATED  BY  A SERIAL  REPORT  NUMBER. 


<21  DEPARTMENTAL  REPORTS.  A SEMIFORMAL  SERIES.  RECORDING  INFORMA 
TION  OF  A PRELIMINARY  OR  TEMPORARY  NATURE,  OR  OF  LIMITED  INTEREST  OR 
SIGNIFICANCE.  CARRYING  A DEPARTMENTAL  ALPHANUMERIC  IDENTIFICATION. 

(31  TECHNICAL  MEMORANDA.  AN  INFORMAL  SERIES.  USUALLY  INTERNAL 
INORKING  PAPERS  OR  DIRECT  REPORTS  TO  SPONSORS.  NUMBERED  AS  TM  SERIES 
I REPORTS;  NOT  FOR  GENERAL  DISTRIBUTION. 
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